Underfitting vs. Overfitting
Underfitting
Common solutions
- include more features
- try a more complicated model
- for neural networks (Recurrent Neural Networks, Convolutional Neural Networks):
    - find a better network architecture and hyperparameters
    - train longer or use a better optimization algorithm (e.g. variants of Gradient Descent)
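A minimal numpy sketch of what underfitting looks like in practice (the data and model degrees are hypothetical): a straight line fit to curved data has high error even on the training set, while a more flexible model drives the training error down to the noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical curved data: y = x^2 plus Gaussian noise
x = np.linspace(-3, 3, 50)
y = x**2 + rng.normal(scale=0.5, size=x.size)

# Degree-1 polynomial underfits the curve; degree-2 matches it
mses = {}
for degree in (1, 2):
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    mses[degree] = np.mean((y - pred) ** 2)
    print(f"degree {degree}: training MSE = {mses[degree]:.2f}")
```

The high degree-1 training error is the signature of underfitting: the model is too simple to capture the signal, so even the data it was fit on is poorly predicted.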
Overfitting
Common solutions
- try a simpler model
- collect more training data
- Feature selection: choose which features to include or exclude when there are many features
- Regularization => penalize large parameter values to reduce overfitting during estimation
- for neural networks (Recurrent Neural Networks, Convolutional Neural Networks): find a better network architecture and hyperparameters
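A minimal sketch of regularization, using closed-form ridge regression as an example (the data, dimensions, and penalty values are hypothetical). With few samples and many features the unregularized fit chases noise; increasing the penalty λ shrinks the coefficients toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: few samples, many features -> easy to overfit
X = rng.normal(size=(20, 15))
w_true = np.zeros(15)
w_true[:3] = [2.0, -1.0, 0.5]          # only 3 features actually matter
y = X @ w_true + rng.normal(scale=0.3, size=20)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

norms = {}
for lam in (0.0, 1.0, 10.0):
    w = ridge(X, y, lam)
    norms[lam] = np.linalg.norm(w)
    print(f"lambda = {lam:5.1f}  ||w|| = {norms[lam]:.2f}")
```

The coefficient norm decreases monotonically as λ grows, which is the sense in which regularization "reduces overfitting during estimation": it trades a little bias for much lower variance.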
Estimates of out-of-sample accuracy => estimate the degree of overfitting
- Cross-validation => test the model’s predictive accuracy on held-out samples
- Information criteria => construct a theoretical estimate of the relative out-of-sample Kullback-Leibler Divergence (Information theory and Entropy in Neuroscience#^aa57f9)
    - Akaike Information Criterion (AIC)
    - Deviance Information Criterion (DIC): a more general version of AIC
    - Widely Applicable Information Criterion (WAIC): even more general than AIC and DIC
    - Pareto-Smoothed Importance Sampling (PSIS): approximates leave-one-out cross-validation, often reported alongside WAIC
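A minimal numpy sketch of both ideas side by side (all data, the fold count, and the candidate models are hypothetical): k-fold cross-validation scores each model on held-out folds, and a Gaussian AIC (here written up to an additive constant as n·log(RSS/n) + 2k) penalizes the training fit by the number of parameters. Both should prefer the model that matches the true curve over one that underfits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: quadratic trend plus noise
x = np.linspace(-3, 3, 60)
y = x**2 + rng.normal(scale=1.0, size=x.size)

def kfold_mse(x, y, degree, k=5):
    """Average held-out MSE of a degree-d polynomial over k folds."""
    idx = rng.permutation(x.size)
    scores = []
    for fold in np.array_split(idx, k):
        tr = np.setdiff1d(idx, fold)
        coeffs = np.polyfit(x[tr], y[tr], degree)
        scores.append(np.mean((y[fold] - np.polyval(coeffs, x[fold])) ** 2))
    return np.mean(scores)

def aic(x, y, degree):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k."""
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = degree + 1                       # number of fitted coefficients
    return x.size * np.log(rss / x.size) + 2 * k

for degree in (1, 2):
    print(f"degree {degree}: CV MSE = {kfold_mse(x, y, degree):.2f}, "
          f"AIC = {aic(x, y, degree):.1f}")
```

Cross-validation actually holds data out, while AIC only approximates the out-of-sample penalty theoretically; the note's hierarchy (AIC -> DIC -> WAIC) generalizes that approximation to broader model classes.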
BUT! Do not use predictive criteria to choose a causal estimate: predictive criteria can actually prefer confounded models (Causal inference#^b7b0a6).